Abstract
Background: Pairwise stochastic context-free grammars (Pair SCFGs) are powerful tools for evolutionary analysis of RNA, including simultaneous RNA sequence alignment and secondary structure prediction, but the associated algorithms are intensive in both CPU and memory usage. Other RNA alignment-and-folding algorithms based on Sankoff's 1985 algorithm face the same problem. It is therefore desirable to constrain such algorithms by pre-processing the sequences and using this first pass to limit the range of structures and/or alignments that can be considered.

Results: We demonstrate how flexible classes of constraint can be imposed, greatly reducing the computational costs while maintaining a high quality of structural homology prediction. Any score-attributed context-free grammar (e.g. energy-based scoring schemes, or conditionally normalized Pair SCFGs) is amenable to this treatment. It is now possible to combine independent structural and alignment constraints of unprecedented general flexibility in Pair SCFG alignment algorithms. We outline several applications to the bioinformatics of RNA sequence and structure, including Waterman-Eggert N-best alignments and progressive multiple alignment. We evaluate the performance of the algorithm on test examples from the RFAM database.

Conclusion: A program, Stemloc, that implements these algorithms for efficient RNA sequence alignment and structure prediction is available under the GNU General Public License.

Background

As our knowledge of RNA's diverse functional repertoire grows [1-5], so does the demand for faster and more accurate tools for RNA sequence analysis. In particular, comparative genomics approaches hold great promise for RNA because of the well-behaved base-pairing correlations in an RNA gene family with conserved secondary structure (well-behaved, at least, compared to protein structures).
Whereas the structural signal encoded in a single RNA gene is rather weak and may be barely (if at all) distinguishable from the secondary structure of a random sequence [6], the covariation signal increases with every additional sequence considered. Many programs for comparative analysis of RNA require the sequences to be prealigned [7-9]. This can be a source of error, since misaligned bases can add noise that swamps the covariation signal. The most recent of these methods allows for some uncertainty in the alignment [7]. More generally, alignment and structure prediction can be viewed as a combined problem, to be solved simultaneously. This is the approach taken in this paper. It was also taken by earlier programs such as FOLDALIGN [10], DYNALIGN [11], CARNAC [12] and QRNA [9], and by our dart library, introduced in a previous paper [13] and extended here. In this framework, fixing the alignment can be viewed as a partial constraint on the simultaneous alignment/folding problem.

Published: 24 March 2005. BMC Bioinformatics 2005, 6:73. doi:10.1186/1471-2105-6-73. Received: 30 April 2004; Accepted: 24 March 2005. This article is available from: http://www.biomedcentral.com/1471-2105/6/73 © 2005 Holmes; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

A powerful, general dynamic programming algorithm for simultaneously aligning and predicting the structure of multiple RNA sequences was developed by David Sankoff [14]. The energy-based folding of Zuker et al [15] and recent approaches based on stochastic context-free grammars (SCFGs) [9,13,16-20] are both closely related to Sankoff's algorithm.
Sankoff's algorithm takes time $O(L^{3N})$ and memory $O(L^{2N})$ for N sequences of length L. At the time of writing this is prohibitively expensive for all but fairly short sequences. That cost has motivated the development of various constrained versions of these algorithms [9-11,13,21].

The purpose of this paper is to report our progress on general pairwise constrained versions of Sankoff's algorithm (or, more precisely, constrained versions of some related dynamic programming algorithms for SCFGs). The overall aim is the simultaneous alignment and structure prediction of two RNA sequences, X and Y, subject to an SCFG-based scoring scheme and user-supplied constraints. Additionally, we wish to be able to parameterize the model automatically from training data. Without constraints, these tasks are addressed by the resource-intensive CYK and Inside-Outside algorithms. Here we present constrained versions of these algorithms that work in reduced space and time (the exact complexity depends nontrivially on the constraints).

Our system of constraints is quite general. Previous constrained versions of Sankoff-like algorithms, such as the programs DYNALIGN [11] and FOLDALIGN [10], have been restricted to "banding" the algorithm. Examples are constraining the maximum insertion/deletion distance between the two sequences or the maximum separation between paired bases. Alternatively, constraints on the accessible structures [13] or alignments [9] have been described. The algorithms described here can reproduce nearly all such banding constraints and, further, can take advantage of more flexible sequence-tailored constraints. Specifically, the fold envelopes determine the subsequences of X and Y that the algorithm can consider, while the alignment envelope determines the permissible cut-points in the pairwise alignment of X and Y. The fold envelopes can be used to prune the search over secondary structures (e.g.
by including/excluding specific hydrogen-bonded base-pairings), while the alignment envelope can be used to prune the search over alignments (e.g. by including/excluding specific residue-level homologies). The fold envelopes can be precalculated for each sequence individually (e.g. by energy-based folding or a single-sequence SCFG). The alignment envelope can be precalculated by comparing the two sequences without regard for secondary structure (e.g. using a pairwise hidden Markov model). Both types of pre-comparison are far less resource-intensive than the unconstrained Sankoff-like algorithms.

The design of the constrained algorithms is discussed using concepts from object-oriented programming: the dynamic programming matrix can be viewed as a sparsely populated container, whereas the main loop that fills the matrix is a complex iterator [22]. The algorithms have been implemented in stemloc, a freely available program for RNA sequence alignment that also includes algorithms for determining appropriate constraints automatically. Results demonstrating the program's efficient resource usage are presented.

The stemloc program also implements various familiar extensions to pairwise alignment, including local alignment [23], Waterman-Eggert N-best suboptimal alignments [24] and progressive multiple alignment [25]. The envelope framework, rather than these extensions, is the main focus of this paper. Even so, the extensions are straightforward to implement within this framework and are briefly described.

Results

To investigate the comparative resource usage of the different kinds of constraint that can be applied using fold and alignment envelopes, stemloc was tested on 22 pairwise alignments taken from version 6.1 of RFAM [37]. These alignments span 7 families of functional noncoding RNA. Each chosen test family had a consensus secondary structure published independently in the literature, and no two sequences in the test set had more than 60% identity.
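A fold envelope restricts which subsequences the dynamic programming may visit. The matrix itself can be treated as a sparse container filled by an iterator. The sketch below illustrates both ideas, but it rests on simplifying assumptions that are not stemloc's:

- It handles one sequence rather than a pair.
- It substitutes Nussinov-style base-pair maximization for SCFG scoring.
- It builds a hypothetical N-best fold envelope as the set of half-open spans [i, j) that split no base pair of at least one input structure.

The function names and pairing rules are illustrative assumptions.

```python
from typing import Dict, Iterator, List, Set, Tuple

Span = Tuple[int, int]  # half-open subsequence [i, j) of one sequence

# Assumed pairing rules for this sketch: Watson-Crick plus G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
NEG_INF = float("-inf")


def base_pairs(dot_bracket: str) -> List[Tuple[int, int]]:
    """Return the (i, j) base pairs encoded by a dot-bracket string."""
    stack: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for pos, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            pairs.append((stack.pop(), pos))
    return pairs


def fold_envelope(length: int, structures: List[str]) -> Set[Span]:
    """Hypothetical N-best fold envelope: every span [i, j) that contains
    both or neither ends of each base pair of at least one given structure."""
    envelope: Set[Span] = set()
    for s in structures:
        pairs = base_pairs(s)
        for i in range(length + 1):
            for j in range(i, length + 1):
                if all((i <= a < j) == (i <= b < j) for a, b in pairs):
                    envelope.add((i, j))
    return envelope


def envelope_iterator(envelope: Set[Span]) -> Iterator[Span]:
    """Visit spans shortest-first, so every sub-span a cell depends on
    has already been filled when the cell itself is computed."""
    return iter(sorted(envelope, key=lambda span: (span[1] - span[0], span[0])))


def constrained_max_pairs(seq: str, envelope: Set[Span]) -> float:
    """Nussinov-style maximum number of base pairs, restricted to the
    envelope. Cells outside the envelope are never stored; reading one
    yields -inf, i.e. that subsequence is forbidden."""
    cell: Dict[Span, float] = {}  # the sparsely populated DP "container"

    def get(i: int, j: int) -> float:
        return cell.get((i, j), NEG_INF)

    for i, j in envelope_iterator(envelope):
        if i == j:
            cell[(i, j)] = 0.0  # empty subsequence scores zero
            continue
        # Leftmost or rightmost base left unpaired.
        best = max(get(i + 1, j), get(i, j - 1))
        # Outermost bases pair with each other.
        if j - i >= 2 and (seq[i], seq[j - 1]) in PAIRS:
            best = max(best, get(i + 1, j - 1) + 1)
        # Bifurcation into two adjacent subsequences.
        for k in range(i + 1, j):
            best = max(best, get(i, k) + get(k, j))
        cell[(i, j)] = best
    return get(0, len(seq))


if __name__ == "__main__":
    seq = "GGGAAACCC"
    full = fold_envelope(len(seq), ["." * len(seq)])  # no pairs: all spans allowed
    env = fold_envelope(len(seq), ["(((...)))"])
    print(len(full), len(env))
    print(constrained_max_pairs(seq, full), constrained_max_pairs(seq, env))
```

For `GGGAAACCC`, the unconstrained envelope (built from an all-dot structure) and the envelope built from `(((...)))` both give 3 base pairs. The second envelope stores fewer cells. Keeping only envelope cells in a dict mirrors the "sparsely populated container" view. Sorting spans by length is one simple way to realize the fill-order "iterator". A practical implementation would index spans by their end points rather than scanning every split point k.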
The EMBL accession numbers and co-ordinates of all sequences are listed in Table 5. Table 5 also shows the performance of stemloc using the 1000-best fold envelope and the 100-best alignment envelope. The RFAM families are:

- S15, the ribosomal S15 leader sequence
- the U3 and U5 spliceosomal small nucleolar RNAs
- IRE, the iron response element from UTRs of genes involved in vertebrate iron metabolism
- glmS, the glucosamine-6-phosphate-activated mRNA-cleaving ribozyme
- Purine, the prokaryotic purine-binding riboswitch
- 6S, the E. coli polymerase-associated transcriptional repressor

The following three test regimes were used, each representing a different combination of fold and alignment envelopes:

- N-best alignments, all folds: the alignment envelope containing the N best primary sequence alignments, with unconstrained fold envelopes (stemloc options: '--nalign N --nfold -1'). This is the red curve in Figures 8 to 13.
- N-best folds, all alignments: the unconstrained alignment envelope, with fold envelopes containing the N best single-sequence structure predictions (stemloc options: '--nalign -1 --nfold N'). This is the green curve in Figures 8 to 13.
- N-best folds, 100-best alignments: the alignment envelope containing the 100 best primary sequence alignments, with fold envelopes containing the N best single-sequence structure predictions.

Figure 1. A parse tree for the grammar of Table 1. Each internal node is labeled with a nonterminal (Stem or Loop), and the subsequences $(X_{ij}, Y_{kl})$ generated by each internal node are shown. The parse tree determines both the structure and the alignment of the two sequences. The cut-points of the alignment are the sequence co-ordinates at which the alignment can be split, i.e. {(0, 0), (1, 1), (2, 2), ..., (15, 12), (16, 13), (17, 14)}.
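Figure 1's definition of cut-points suggests a simple representation for an alignment envelope. The sketch below rests on three assumptions:

- An alignment is given as two equal-length gapped rows.
- An N-best alignment envelope is the union of the cut-points of those N alignments.
- A conventional indel band (maximum |i - k|) is expressed as just another envelope.

The function names are hypothetical and are not a description of stemloc's internals.

```python
from typing import Iterable, List, Set, Tuple

CutPoint = Tuple[int, int]  # (residues of X consumed, residues of Y consumed)


def cut_points(row_x: str, row_y: str, gap: str = "-") -> List[CutPoint]:
    """Return (0, 0) plus the prefix lengths of X and Y after every column,
    i.e. each point at which the alignment could be split."""
    if len(row_x) != len(row_y):
        raise ValueError("alignment rows must have equal length")
    i = k = 0
    points = [(0, 0)]
    for a, b in zip(row_x, row_y):
        if a == gap and b == gap:
            raise ValueError("all-gap column")
        i += int(a != gap)
        k += int(b != gap)
        points.append((i, k))
    return points


def alignment_envelope(alignments: Iterable[Tuple[str, str]]) -> Set[CutPoint]:
    """Hypothetical N-best envelope: union of the alignments' cut-points."""
    envelope: Set[CutPoint] = set()
    for row_x, row_y in alignments:
        envelope.update(cut_points(row_x, row_y))
    return envelope


def banded_envelope(len_x: int, len_y: int, max_indel: int) -> Set[CutPoint]:
    """A classic band as an envelope: allow (i, k) only if |i - k| <= max_indel.
    max_indel must be at least |len_x - len_y| for the end point to be allowed."""
    return {(i, k) for i in range(len_x + 1) for k in range(len_y + 1)
            if abs(i - k) <= max_indel}


def fits(row_x: str, row_y: str, envelope: Set[CutPoint]) -> bool:
    """True if every cut-point of the alignment lies inside the envelope."""
    return all(p in envelope for p in cut_points(row_x, row_y))


if __name__ == "__main__":
    # Two candidate alignments of X = ACGU and Y = AGU.
    best = [("ACGU", "A-GU"), ("ACGU", "AG-U")]
    env = alignment_envelope(best)
    print(sorted(env))
    print(fits("ACGU", "-AGU", env))
```

Here the two alignments contribute six distinct cut-points. The alignment ACGU/-AGU is rejected because its cut-point (1, 0) lies outside the envelope. Treating a band as one particular envelope is one way to see how the envelope framework can reproduce banding constraints.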